Efficient Algorithms for Parsing the DOP Model? A Reply to Joshua Goodman

Author

  • Rens Bod
Abstract

In the introduction of his paper, Goodman observes that the DOP model "... can be summarized as a special kind of Stochastic Tree Substitution Grammar (STSG): given a bracketed, labeled training corpus, let every subtree of that corpus be an elementary tree, with a probability proportional to the number of occurrences of that subtree in the training corpus." Goodman then neglects to add that, according to the DOP model, the "preferred" or "best" parse tree of a sentence is the most probable parse tree of that sentence. This definition is found in all my publications about DOP (e.g. Bod, 1992-96; van den Berg, Bod and Scha, 1994; Bod and Scha, 1994; Bod, Krauwer and Sima'an, 1994; Bod, Bonnema and Scha, 1996; Sima'an, Bod, Krauwer and Scha, 1994). Since in DOP the probability of a tree is the sum of the probabilities of all distinct derivations that produce that tree, computing the most probable tree is very expensive. Sima'an (1996b) proves that the problem of computing the most probable tree of a sentence in DOP is NP-hard. This proof does not rule out an algorithm that estimates the most probable tree of a sentence with an error that can be made arbitrarily small (cf. Bod, 1993b), but it does mean that, unless P = NP, there is no deterministic polynomial-time algorithm for finding the most probable tree.
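The distinction the abstract turns on, between the probability of a single derivation and the probability of a parse tree (the sum over all of its derivations), can be made concrete with a small Python sketch. It assumes the DOP1-style estimate in which an elementary tree's probability is its corpus count divided by the total count of subtrees sharing its root label, and it works on a hypothetical, hand-made list of derivations for one sentence rather than on a real treebank; the function names, the toy labels T1 and T2, and all numbers are illustrative only. Enumerating every derivation, as this toy does, is exactly what becomes infeasible for real sentences, so the last function estimates the most probable tree by sampling derivations instead. It is a sketch in the spirit of the Monte Carlo estimation cited above (Bod, 1993b), not a reproduction of it.

```python
import random
from collections import Counter, defaultdict
from math import prod


def subtree_probabilities(counts):
    """Assumed DOP1-style estimate: P(t) = count(t) / total count of subtrees with t's root label.

    `counts` maps (root_label, subtree_id) -> occurrences in the training corpus.
    """
    totals = defaultdict(int)
    for (root, _), c in counts.items():
        totals[root] += c
    return {key: c / totals[key[0]] for key, c in counts.items()}


def parse_probabilities(derivations):
    """P(T) = sum over the derivations of T of the product of their elementary-tree probabilities."""
    scores = defaultdict(float)
    for tree, elem_probs in derivations:
        scores[tree] += prod(elem_probs)
    return dict(scores)


def most_probable_derivation(derivations):
    """Tree produced by the single highest-probability derivation (not DOP's preferred parse)."""
    tree, _ = max(derivations, key=lambda d: prod(d[1]))
    return tree


def monte_carlo_parse(derivations, n_samples=10_000, seed=0):
    """Sample derivations in proportion to their probability and return the most frequent tree."""
    rng = random.Random(seed)
    trees = [tree for tree, _ in derivations]
    weights = [prod(elem_probs) for _, elem_probs in derivations]
    sample = rng.choices(trees, weights=weights, k=n_samples)
    return Counter(sample).most_common(1)[0][0]


# Hypothetical derivations of one sentence: (resulting parse tree, elementary-tree probabilities).
toy = [
    ("T1", [0.4, 0.5]),   # 0.2, the single best derivation
    ("T2", [0.5, 0.2]),   # 0.1
    ("T2", [0.25, 0.4]),  # 0.1
    ("T2", [0.2, 0.5]),   # 0.1
]

# S subtrees get 0.75 and 0.25; the only NP subtree gets 1.0
print(subtree_probabilities({("S", "s1"): 3, ("S", "s2"): 1, ("NP", "np1"): 2}))
print(most_probable_derivation(toy))  # T1
print(parse_probabilities(toy))       # T1 has 0.2, T2 about 0.3
print(monte_carlo_parse(toy))         # T2
```

With these numbers the single best derivation yields T1, yet T2 is the more probable tree because its three derivations together outweigh T1's one; the sampled estimate follows the summed tree probabilities rather than the best derivation.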

Similar resources

Efficient Algorithms for Parsing the DOP Model

Excellent results have been reported for Data-Oriented Parsing (DOP) of natural language texts (Bod, 1993c). Unfortunately, existing algorithms are both computationally intensive and difficult to implement. Previous algorithms are expensive due to two factors: the exponential number of rules that must be generated and the use of a Monte Carlo parsing algorithm. In this paper we solve the first p...

Data-Oriented Parsing

1. A DOP model for phrase-structure trees (R. Bod and R. Scha)
2. Probability models for DOP (R. Bonnema)
3. Encoding frequency information in stochastic parsing models
1. Computational complexity of disambiguation under DOP (K. Sima'an)
2. Parsing DOP with Monte Carlo techniques (J. Chappelier and M. Rajman)
3. Towards efficient Monte Carlo parsing (R. Bonnema)
4. Efficient parsing of DOP with PCFG-redu...

A Consistent and Efficient Estimator for the Data-oriented Parsing Model

Given a sequence of samples from an unknown probability distribution, a statistical estimator aims at providing an approximate guess of the distribution by utilizing statistics from the samples. One desired property of an estimator is that its guess approaches the unknown distribution as the sample sequence grows large. Mathematically speaking, this property is called consistency. This thesis p...

Journal title:
  • CoRR

Volume: cmp-lg/9605031

Publication date: 1996